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ABSTRACT 

Disk self-gravity could play an important role in the dynamic evolution of 
interaction between disks and embedded protoplanets. We have developed a fast 
and accurate solver to calculate the disk potential and disk self-gravity forces for 
disk systems on a uniform polar grid. Our method follows closely the method 
given by Chan et al. (2006), in which an FFT in the azimuthal direction is 
performed and a direct integral approach in the frequency domain in the radial 
direction is implemented on a uniform polar grid. This method can be very 
effective for disks with vertical structures that depend only on the disk radius, 
achieving the same computational efficiency as for zero-thickness disks. 

We describe how to parallelize the solver efficiently on distributed parallel 
computers. We propose a mode-cutoff procedure to reduce the parallel commu- 
nication cost and achieve nearly linear scalability for a large number of processors. 
For comparison, we have also developed a particle-based fast tree-code to calcu- 
late the self-gravity of the disk system with vertical structure. The numerical 
results show that our direct integral method is at least two order of magnitudes 
faster than our optimized tree-code approach. 
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Introduction 



Well before extrasola r planets were discovered, iGoldreich Sz Tremaind (119791 . Il98dl ) and 
Lin fc Papaloizoul ( 1986al lbl) have speculated that t idal interacti ons between disks and embed- 



ded protoplanets would lead to planet migration 



types of migration could occur. iNelson fc Bend (12003a 



p. 

N 



Ward 



19971 ) suggested that two different 
bj) studied numerica lly the effects of 



disk sel f-gravity in two-dimensional simulations of planet-disk interactions. 



Pierens &: Hure 



( ]2005al ) reported by an analytical deri vation that the disk g ravity accelerates the planetary 



migration. Recently, simulations by iBaruteau fc Massetl (120081 ) confirmed that the self- 
gravity indeed accelerates the type I migration. They impleme nted a 2D self-gravity solve r 
in their code using the fast Fourier transform (FFT) method of iBinney fc Tremaind (119871 ). 
which requires a logarithmic radial spacing (log(r)). 

In Newtonian gravity, we can define the gravitational potential \1/ associated with the 
mass density, p, by the volume integral 

P(x') 



* x 



-G 




-d 3 x' 



over all space, where G is the gravitational constant. Rewriting Eq. ([T]) in differential form, 
we obtain Poisson's equation 

V 2 ^ = AirGp, (2) 

with \1/ satisfying the boundary condition \I/(oo) = at all times. The numerical "Poisson 
solvers" to Eq. ([2]) can be classified into two categories: difference methods and integral 
methods. The difference methods solve Eq. (J2J) directly by either finite-difference or finite- 
element methods with necessary boundary conditions. The known boundary conditions at 
infinity are usually not very useful if a finite domain is considered. User specified boundary 
conditions or Dirichlet boundary conditions obtained by direct summation are often required. 
A key advantage of difference methods is that, generally, they are relatively fast once ini- 
tialized. However, they have low or limited accuracy, and they often rely on the integral 
method to provide the boundary conditions. 

The integral methods are to integrate Eq. ([T]) directly. They have the advantage that 
the summation stops naturally at the domain boundaries. However an integral method 
has two difficulties in a practical implementation. One is that the integral has a point 
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mass singularities (i.e. when x' — > x in Eq. ([I])), which is often circumvented by introducing 
softening. The other difficulty is that they are computationally prohibitive for a large system. 
The computational cost can be significantly lowered if the FFT method can be used. 

The gravitational field, g = — V^, is more convenient in many applications. It can be 
calculated via numerical derivatives, which generally have poor precision. To obtain high 
accuracy, we can integrate the field directly by 

s(xHG ///^r dV • (3) 

which shares the same two difficulties as calculating Eq. ([T]). 

In this paper, we present a method for computing the disk self-gravity for quasi-2D disk 
models. We consider a disk with the cylindrical grid (r,(f>,z). We assume that the scale 
height of the disk (semi-thickness) is only radius-dependent, i.e., H = H(r), and it is quite 
small, H(r)/r <C 1 (namely geometrically thin disks). We also assume that the vertical 
structure of the density can be described by some function Z(r, z) that is independent of (p 
and time t, i.e., 

pit, r, 0, z) = E(t, r, <f>)Z{r, z). 

In this paper, we are particularly interested in the gravitational potential and field force at 
the z = plane. Although our method is valid for any function of Z(r, z), we assume that 
the density vertically has a Gaussian distribution. Under this assumption, 

Z(r, z) = , 1 exp ( . (4) 

With the presence of the ver t ical s tructure (jlj), the migration rate of the protoplanet is 
reduced up to 50% (see iKollerl (120041 ) ). We assume that the disk has a constant sound speed 
c s . Then from the scale height 

H = c^ 
r v K 

where vk is the Keplerian velocity Vk{t) = \J GM*/r, we obtain 



H(r) 



c r 3/2 



Other profiles for the scale height are also easy to handle. For example, if the aspect ratio, 
h = H/r, is constant, then H(r) = rh. The numerical verification in factually uses a 
constant H in the whole domain. 
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Since we are interested in the potential and field in the z = plane, the easiest method 
to obtain \I/ is to integrate directly the equation 

■-■2k 



X(ty,<j>')r'dr'c 



GZ(r', z') 



\/r 2 + 



j2 



2rr'cos( 



-,/2 



dz' (5) 



#(t,r,0) = 

where r min and r max are the i n ner a nd outer radial boundaries, and £ is the vertically 
integrated density. As in iHurel (120051 ) , we have employed two different coordinate systems 
in the radial direction: r as the field grid and r' as the source grid. For simplicity, we set 
G = 1 and M* = 1. 

The rest of th e paper is organized as follows. In §0 we describe the method given by 
(hood ) on how to solve Eq. ([5]) using a direct integ ral via a Green's function 

Chan et all (12006k differs 



Chan et al 



method. Our method, though essentially follows the one given by 
in that we use a direct summation method on a uniform grid in the radial direction, which is 
the same grid used in our hydro code. In addition, we present two approaches to circumvent 
the singularity in Eq. (Q and two methods to calculate the force field. For comparison, we 
have also implemented a 2D tree-code with a simplified 3D treatment to calculate the self- 
gravity for disks with vertical structures. In §|3]we present an efficient parallel implementation 
scheme on distributed memory computers. We describe a new algorithm to calculate the 
gravity force at an arbitrary point. We also present numerical test results to compare different 
approaches. A few concluding remarks are given in §HJ 



Green's Function Method 



We present here a numerical method to com pute integral ([5]). For zero -thickness non- 
axisymmetric disk, the classical FFT method fe.g.lBinnev fe Tremaindll987l) based on polar 
grid, which has b een implemented by iBaruteau fc Massetl (120081 ). is often used. However, 



as pointed out by iHure fc Pierensl (120051 ). the FFT method has a few drawbacks. First it 
requires a grid with a logarithmic radial spacing, which could be inconvenient to most hydro- 
solvers using uniform spacing. Secondly, to avoid the well-known alias issue, the FFT method 
requires to double the number of cells along the radial direction. The FFT calculation is 
thus done on a grid that has twice the extent of the hydrodynamic grid along the radial 
direction, which induces some complications in the calculation of the convolution kernels, as 



- 5 - 



well as many communications between both grids. Furthermore, with the presence of vertical 
structure, it is impossible to apply the FFT method in 3D cylindrical coordinates because 
neither the potential nor the gravitational field can be represent ed as convolution products 



in z. There is no such coordinate transformation as described in 
to make the integral become a convolution product in all coordinates. 



Binney fc Tremaind (119871 ) 



The Green's function method given by 



Chan et al. 



(120061 ) avoids the FFT in the radial 



direction. Instead, a pseudo-spectral method on a scaled cosine radial grid is used to achieve 
the high-order accuracy. Moreover, a known, time-independent vertical structure is easy to 
incorporate. In the following, we describe modifications to their method so that it can be 
applied directly to a uniform radial grid. 



2.1. Modified method given by I Chan et a.1.1 (120061 ) 



Chan et al.l (120061 ) proposed a direct integral method to solve Eq. (JSJ). For the sake of 
completeness, we recap the key steps in their method here. First, we introduce a softening 
e to Eq. (j5J) and denote 



g( r y, </>-</>') 



Z(r',z' 



r 2 + r' 2 — 2rr'cos( 



e 2 + z' 2 



-.dz' 



(6) 



where e can be zero or a radius- dependent parameter that will be discussed later in Section 
12.21 Function Q(r, r', — 0') can be computed either analytically or by numerical quadrature. 
For the special case of Z(r, z) defined by d3J, 



Q(r, r', 



2 / 4 K (R 2 /4) 



2ttH r' 



where R 2 = (r 2 + r' — 2rr'cos(0 — 0') + e 2 )/H 2 (r'), and K denotes the modified Bessel 
function of the second kind. Let 



J(r, r', - 0') = 2vrr'^(r, r', <f> -</>'). 



(7) 



Then Eq. (jSJ) becomes 



2- 



— / H(t,r',(j)')I(r,r',(j) — (j)')dr'd(j)' 
2vr Jo 
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Note that both £ and I are periodic functions with period 2ir, and they are in a natural 
convolution representation in Eq. (jSJ). Applying Fourier transform to (jSJ) with respect to 
(w.r.t.) 0, and using the convolution theorem, we obtain 



£ m (t, r')I m (r, r')dr', m e [-00, +00], 



(9) 



where f m represents the coefficients in the Fourier series expansion of /, which is given as 

r-27T 



1 

2^ 



f(<P)e- im<t> d<P . 



Chan et al. 



fl2006h 



Note that our equation (Q is slightly different from the equation (39) in 
since we have put all the relevant coefficients in the representation of / in Eq. ([7]). 

The integral over r in Eq. (ED wa s evalu ated using Chebyshev spectral method on a 
"Chebyshev-roots grid" by 



Chan et al. 



(120061 ). However, for most hydro codes that use a 
uniform grid, it is desirable to use the same uniform grid for both hydro and self-gravity 
solutions to avoid interpolation between the hydro solver and self-gravity solver (which is 
inconvenient and introduces interpolation error). 

We propose to integrate (Q directly with numerical quadrature using the available 
discrete values of £ m and I m on a uniform grid. Note that 7(r, r', — 0') has an analytic 
expression and i s not changed with time as long as the vertical structure remains the same. 



As proposed by I Chan et al.l (120061 ). we can pre-compute its Fourier transform, I m (r,r'), to 
speed up the algorithm. The whole algorithm can be summarized as follows: 

1. For each pair (r, r'), we pre-compute 7 m (r, r'). Take the Fourier transform of J(r, r', <fi — 
0'), which is defined in Eq. (jTj), w.r.t. <f) — 0', 

J m (r, r>) = FFT(J(r, r\ - 0')), m = 1, 2, N+, 

where is the number of cells in 0-direction. Since J(r, r, — 0') is a real and even 
function w.r.t. — 0', only the discrete cosine transform is needed and modes 
need to be stored. 



2. Take the Fourier transform of S(r', 0') w.r.t. 0' to obtain E m (r'), m = 1, 2, A^. 
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3. Calculate Eq. (Q by numerical quadrature in the radial direction to obtain ^f m (r). 
Either the midpoint or trapezoidal rule can be used, depending on the grid point 
distribution of the source grid. 

4. Take the inverse Fourier transform of \l/(r) w.r.t. <p to obtain \I/(r, <fi). 

We should emphasize that the pre-computing of J m (r, r') plays a major role in the efficiency of 
our self-gravity solver. This step can be calculated once for all during a long time simulation 
of the disk-planet interaction system. We find that it costs at least fifty times more than the 
rest of steps. One cannot afford to calculate it in every time step. 

The overall computational cost is O^Nj-N^logN^ + N^N^), where N r is th e number of 
cells i n the radial direction. Again, these steps are essentially the same as in 



Chan et al. 



(120061 ). except that the integration is done on a uniform polar grid using direct summation. 



2.2. Treatment of singularity 

When e — 0, the integrand in 



L)q. (Bp contains a singularity. Note that the density 
splitting method of lPierens fc Hurel (l2005bT ) cannot be used in our case because the density 



profile is unknown during the pre-computing of /„ 



r, r 



We have tested two approaches to solve the singularity issue. One approach is to use 
a nonzero softening, e(r'). We propose a function that scales roughly linearly with Ar, i.e., 
e(r') = a(r')Ar. The idea is that the mass distribution from the small spheres at each cell 
center should approximate a smooth mass distribution instead of a sum of (^-functions. We 
optimize a(r') by minimizing the relative error in \l/(r, 0) for sharply peaked Gaussian mass 
distributions located at r' ranging from 0.6 to 1.8 (scaled value for the disk-planet problem) 
with c s = 0.05. We find a piecewise linear function for a(r): 





0.17 + 


0.06 
0.6 


[r — 


'"min) ; 


r < 1.0 






0.23 + 


0.03 
0.2 


[r — 


1.0), 


1.0 < r 


< 1.2 


r) = < 


0.26 + 


0.04 
0.3 


[r — 


1.2), 


1.2 < r 


< 1.5 




0.30 + 


0.03 
0.2 


[r — 


1.5), 


1.5 < r 


< 1.7 




0.33 + 


0.04 
0.3 


[r — 


1.7), 


1.7 < r 





(10) 
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Our exper iments show that a ( r) is fairly insensitive to different resolutions in the radial 



direction. 



Baruteau Sz Masset 



(120081 ). however, have argued that the softening must scale 
with r. This might be because the logarithmic grid in the radial direction is used in their 
FFT implementation. For the uniform grid along the radial direction, we find that even a 
constant a, e.g., a(r) = 0.23, yields good results for the density peak near r = 1. 

We remark that our choice of the softening should not be mixed with th e softening that 
i s used for calculating the gravity from the planet. It has been pointed out by 



Nelson fc Benz 



f l2003al ) that the softening use d between the disk and pla net should be at least 0.75 times the 



Baruteau fc Masset 



(120081 ) suggested using e(r) = 0.3H(r) for 



physical size of the grid zone, 
both disk self-gravity and the gravity between disk cells and planet. We find that this choice 
is too big and produces large error for disk self-gravity in our implementation. Therefore 
we use two different softenings in our simulations: for the disk self-gravity e = a(r)Ar, and 
for the gravity between disk and planet e = 0.1H(r p ) (r p is the planet position in radial 
direction). The coefficient 0.1, which is still much smaller than the values generally used in 



the literature, is c hosen partial 
very well with the 



Tanaka et al. 



y due to the presence of vertical structure, and it matches 
(120021 ) theory on the torque evaluation. 



A question arises that the different softening choices may introduce inconsistency be- 
tween the disk and planet. This concern can be alleviated by the fact that the gravity 
force for disk cells near the planet are dominated by the planet gravity. Therefore this 
inconsistency does not have much impact on the dynamics of the disk. 

To eliminate the dependence on the softening, we have implemented the other approach, 
which is to us e different g r ids fo r r and r' so that r[ is never equal to rj. This approach has 
been used by 



Chan et al 



(120061 ) for their spectral method. In our case, the integral of Eq. 
© is performed on a fixed uniform grid of r' . Because r' is the cell-centered grid, we define 
r as the node centered grid, i.e., 



r 



r\ - 0.5Ar, 



r' i + 0.5Ar,i = 1,2,...,JV. 



(li; 



Note that the r grid has one more points than the r' grid. After calculating Q(r, r', — 0') for 
r at the node-center, we can obtain the Q(r, r', — 0') for r at the cell-center by interpolation, 
and then calculate the potential \l/(r, 0) at the cell-center. We can also calculate directly the 
potential \l/(r, 0) with r at the node center for use of the field calculation. 
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2.3. Field Calculation 

We have implemented two approaches to calculate the field components. The first 
approach is the difference method: we calculate the potential first and then use the finite- 
difference method to calculate the field. 

If the potential is calculated at the cell-center (rj,0j), we use the central-difference to 
obtain the field: 

^ (riAj) = ^M^rlM (12) 



where the notation (•)<£ = d(cdot) / (rdcj)). Note that we need values of \&(ro, (ft) and \&(rjv+i, <fi) 
for the $ r component at the boundary cells located at r\ and r^; otherwise, one-side finite- 
difference, which is not very accurate, should be used. 

If the potential is calculated at the edge-center (r i+ i, (f)j), the field can be calculated as 
*(r i+ i,0j) - ^(rvi,^) 

MnAj) = 2 Ar 2 (14) 

T , A , ^(r lH Aj+i) + ^(r^Ani) -^(r^ h Aj~i) , . 
**{rM = ' >- • (15) 

High order methods that use wide stencils are also possible. In fact, we can obtain the 
derivative ^ with only one extra FFT in the above algorithm of calculating the potential. 
After obtaining \l/ m (r), we take the inverse transformation of —im^> m (r) to get the ^(f,. 

The second approach for the field calculation is the integral method: we directly con- 
volute the field Green's function with the mass distribution. The differentiation in \l/ r and 
^0 can be applied directly to the Green function, which yields 

* r (r,0) = / / S(t,r / ,0>U(r,r',0-0 / )^0' (16) 

Jr min JO 

/Tmax /■27T 

^(r,0) = / / E(t ) r , ,0> , ^(r,r , ,0-0 / )^ , #' (17) 

Jr min JO 



Since we have two field components, \l/ r and the integral method takes twice the 
computation time of the difference method. But, we expect that the integral method might 
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have higher accuracy than the finite difference approximation. However, if the mode cut- 
off approach, which will be described in Section 13.21 is used, it is possible that the integral 
method may lose this advantage. In fact, we observed in our numerical tests that the integral 
method of (|T6j) and (fT7|) with the mode cut-off gave larger errors and more than double the 
computation and communication times of the finite-difference method. 



2.4. Tree-code force calculation 

Yet another approach we explored to solve for the field involved using a hierarchical 
tree-code. In this approach, we treat each cell as a particle, and hence an effective force law 
between two points, (r, 0) and (r', (/)'), in the disk that accounts for the vertical structure is 
required. Just as before, we assume p(x) = S(r, <p)Z(r, z). And since symmetry arguments 
require Z(r, z) to be an even function of z, the z-component of the force must be zero at the 
z = plane. Therefore, the force is directed in the plane of the disk and it has a magnitude 
factor given by 

= r r z^w^i 

J-oc J-oo {(P + {Z- Z'ff 2 

where d = (r 2 + r' 2 — 2rr'cos((f) — 0')) 1 ^ 2 . 

If we further assume Z(r, z) is given by Eq. (j3J), then we can rewrite Eq. ffT8l) as 

F(d;H,H') = CMd ^ H,) (19) 

where 

t3 roo poo -z 2 /2H 2 -z' 2 /2H' 2 

CMd ; H, H') = _ (20) 
is called the 3D correction factor with respect to the 2D force w i thout the vertical structure. 



As H -> 0, ([20]) approaches to the 3D factor defined bv iKollerl fl2004h . 



We note that this force law is a line-line force (giving the net interaction between the 
mass distributed along two vertical lines at (r, 0) and (r',0')). This is different from what 
is done in the Green's function method, which uses a point-line interaction. By setting 
H = in Eq. f|T9l) . we can recover the interaction force law of the Green's function method. 
However, if we wished to include line-line interactions in the Green's function method, we 
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would have to abandon the more efficient potential calculation of Eq. ([6]) and instead use 
Eqs. ([TBI) and (TT7]) with C/ r and defined appropriately. 

Re-scaling the integration in Eq. (1201) and introducing the non-dimensional variables 
d ' = jm+m and " = ir g ives 

roo roo e -z 2 /2 e -z' 2 /2 

C 3D (d';a) = — / ^dzdz (21) 

27T ./_oo J-OO f , /2 (£-£^0)2 \ 

V 1+a 2 



Finally, we perform an orthogonal coordinate transformation using the variables 

C = /, , 2 ( z - z ' a ) 



a 



£ = ^=={az + z>). 
V 1 + a 



This allows us to simplify Eq. ( T2T1) to 



roo roo -( 2 /2 p -(' 2 /2 

™*=*LLw^« (22) 



revealing that it is actually independent of a. In fact, similar to Eq. (J6j we can express Eq. 
(122)) as 

d' 3 



C m {d>) = --4=e d ' 2 / 4 (^o(rf /2 /4) + K(dVl)) (23) 

ZV Z7T v ' 

where K and K' Q denote the modified Bessel function of the second kind and its derivative. 



The tree-code requires the calculation of CzD^d') and its first and second derivatives 
over a wide range of d' (ranging from 1CT 2 to 10 3 ). Calculating Eq. (|23|) on the fly is 
prohibitively expensive and should be avoided. Another option is to pre-compute Eq. (1231) 
and its derivatives at a discrete set of d' values. Then when C^^d') is needed, we interpolate 
the pre-computed data. This approach can provide a significant speedup, but because the 
data is stored in main memory it is still too slow for our purposes. 

Another approach we explored involves approximating Eq. f[2"3"j) with a model function 
that is accurate yet inexpensive enough to compute on the fly. Plotting C^oid') reveals its 
form (Fig. [lj and suggests a model function of the form 

r(m o dcl) U] _ (d>/d> 1/2 y + (d>/d' 1/2 )* 

{d} -2 + (d'/d' 1/2 y + (d'/d' 1/2 y P - {M} 
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non-dimensional distance non-dimensional distance 



Fig. 1. — 3D correction factor (Eq. (|23l ) and relative error plotted vs. non-dimensional 
distance (<f ) using model Eq. ([2IJ). 




non-dimensional distance non-dimensional distance 



Fig. 2. — 3D Force factor (Eq. ( |T9j) ) and relative error plotted vs. non-dimensional distance 
(<f) using model Eq. (|2"3|) . 



We find optimal parameter values to be dy 2 = 0.8252 and p = 0.957. The relative error 
introduced by using this model is plotted with C%d and F in Fig. [T] and [3 Since generally 
d! = d/H(r) > e/H(r) with H' = 0, the model is acceptable for distance or softening larger 
than 0.005if (r). This model requires the calculation of one power function, a few additions 
and multiplications and one division. For our application, we found this to be the most 
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efficient way to include 3D structure using a tree-code. 



Our tree-code is implemented in a distributed memory parallel architecture using MPI, 
just like o ur Green's func t ion FFT algorithm. There are m any papers detailing parallel 



tree-codes (jDubinskil Il996l ; iMiocchi fc Capuzzo-Dolcettal 120021 ). So we simply highlight the 



important points of our algorithm here. 



1. We begin by pre-computing the tree structure of our grid, which does not change 
with time. Each processor retains a copy of this quad-tree to be used throughout the 
simulation. 

2. All processors compute partial sums (using local particles) of the moments for all nodes 
in the tree. 

3. These partial sums are added and broadcast back to all the processors. 

4. A complete copy of the density profile is distributed to all the processors. This is 
required for load balancing and computing the direct summation part of the force 
(step 5(c)). 

5. Forces are computed at every grid cell. The disk is partitioned azimuthally (the sym- 
metry providing ideal load balancing), and processors are assigned equal-sized sectors 
of particles (grid cells). Each particle traverses the quad-tree to include interactions 
with all other particles. 

(a) If a particle-node interaction meets the accuracy criterion, then the particle in- 
teracts with that node via the moments computed in step 3. 

(b) If the accuracy criterion is not met (the node is too large or the particle-node 
separation too small), then we drop one level in the tree and check all the rejected 
node's sub-nodes for particle-node interactions. 

(c) Last, if we reach a leaf-node (a node with no sub- nodes) and the accuracy criterion 
is not met we perform direct summation over all the leaf-node's particles. 
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3. Numerical Implementation and Experiments 



3.1. Description of the test problem 



As in 



Chan et al 



by 



(120061 ). we consider the density function to be a Gaussian sphere given 

p(r,(j),z) 



M ( r 2 + z 2 



( 27r(T 2)3/2 ex P ^ 2a 2 

where M is the normalized total mass, a is a parameter that controls the width of the mass 
distribution and it has si milar role as the sc ale height in Eq. (j3J). The potential on the z = 

(120061 ) 



plane has been given by 



Chan et al. 



V>(r,0) 



where erf(x) is the error function 



erf(x) 



1 / r 
-ert —=- 
r \^2a 



e _t dt 



(25) 



For a collection of Gaussian spheres centered on the z = plane with the same a, the 
z-dependent vertical structure can be factored out, i.e., 



p(r, (f),z) = Y^ P{n,4>i) ( r > 0; z ) = Z ( r > z ) 



where (r^, 0j) is the center of each sphere, Z(r, z) gives the vertical structure 



Z(r, z) 



V2 



exp 



TIG" 



Hrj-.^A is the surface density 



exp 



2a 2 



(26) 



2<r 2 



and Ri(r, <p) = y r 2 + r^ 2 — 2rr' cos(</> — (pi) is the distance between (r, (j>) and (rj, (j>i). Notice 
that Eq. (|26|) is the same as Eq. (jl]) with a constant scale height H{r) = a. 



We use three Gaussian spheres located at (r, 
a total mass of 2,1/2, and 1 respectively, i.e., 

1, 



1, 0), (0.9, 3tt/4), and (1, — tt/2) with 



S an a(^0) = 2£ (1;O) (r,0) + -£(o.9,37r/4)(?", (j)) + E(i_ w / 2 )(r, 



(27) 
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where (*)ana represents the analytic solution. The exact potential will be 

^ana(r, <p) = 2i(;{R m ) + ~V(#(o.W4)) + ^(#(1,-^/2)) (28) 
where ip is defined by Eq. ( I25l) . The exact field \l/ r and can also be calculated accordingly 

The tests are done in the computational domain [0.4, 2.0] x [0, 2ir] with grid N r x N^. 
The normalized total mass for the disk is M = 0.002M*. We choose = AN r so that the 
cells near r = 1 are nearly square. Without specification, we always use N r = 800, which 
reaches the convergence in both the torque calculation on planet and the azimuthal averaged 



poten tial vorticity distribution on the disk in our disk-planet interaction simulations (ILi et al. 



20051 ). Since the surface density (|2"7|) approaches to zero outside the computational domain, 
the exact potential fl28l) is still valid for our truncated domain. For convenience, we use the 
following notations for comparison: N p denotes the number of processors, -E max denotes the 
maximum error over the whole domain, RE max denotes the maximum relative error, and p 
is the convergence order in E max . Since the force is a vector that has two component, we use 
the following formula to calculate E m3jX 



-^max — \j\f r /anal 2 + 1/^ /anal 2 

where f r and /* are the force components in r- and 0-direction respectively. The local 
relative error for a variable u is defined as RE ma , x (u) = E max /\u&n&\. For the force, |«ana| — 
(/ana) 2 + (/ana) 2 - The global relative error for a variable u is defined as 

RE{u) = U=l tT l=l V'° ^ (29) 
2-, i=\ l^i=i Pi,j,ana| 

All of the computation are performed on a parallel Linux cluster at the Los Alamos 
National Lab. Each node of the cluster is dual-core AMD Opteron(tm) processor with 2.8G 
HZ and 2GB local memory. 



3.2. Parallel implementation and comparison 



Our hydro simulation for the interaction of disk and proto-planet problem is performed 
on a high resolution grid, e.g., N r x = 800 x 3200 grid. The whole domain is split into 
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annular regions for parallel computation. Each annular region has the same number of cells 
in the radial direction. 

The Fourier transform and its inverse can be parallelized without much difficulty be- 
cause they require no communication between different processors. However, the numerical 
quadrature of Eq. (Q is done in the radial direction, where each processor holds only part 
of the information about the density distribution in the whole domain. 

We have tested two approaches to parallelize the numerical quadrature. In the first 
approach, we start by computing a partial quadrature including only source terms from the 
local density distribution, i.e. each processor computes a partial sum of Eq. Q for all r and 
m. We then apply a global communication to obtain the complete summation. 

In the second approach, we begin by performing a global communication over the whole 
domain so that every processor has a complete copy of the density profile for r' G [r min , r max ]. 
Then the complete numerical quadrature can be performed to obtain if? m (r) for the local 
portion of r grid. 

We remark that in the pre-computing stage, no matter which approach is used, each 
processor can compute its own portion of the Fourier transform I m (r,r') independently and 
store it for the later use. It does not need to have a copy of whole profile for all r and r'. 

In the following we will first propose a strategy to reduce the parallel communication 
cost, and then compare the above two approaches in calculating potential \l/(r, 0). For 
simplicity, we denote the first approach as approach I, and the second as approach II. 

3.2.1. Mode cut-off approach to reduce the communication cost 

The communication cost in both approaches is proportional to both the number of 
processors and the amount of data being communicated (total grid size). Consequently, we 
expect that the communication cost will dominate the computation cost as the number of 
processors is increased, eventually becoming a bottleneck for a large number of processors. 
Note that the communication is done in the frequency domain, where the high modes decay 
exponentially for a smooth function. Thus we can truncate the Fourier modes above a cutoff 
parameter (M cut ) without a significant loss of accuracy. If M cut is far smaller than N$ (the 
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total number of mode), the communicate cost can be reduced to a small of fraction of the 
cost before the cutoff. 

For both approaches, we can calculate M cut based on the Fourier transformation of 
the density profile. For a specific radius r^, we define the energy contained in as 
J2m=o |S m (rj)| 2 . Each processor then computes an m cut {r) based on satisfying some preset 
fraction e cut (e.g., e cut = 10 -4 ) of the total energy for modes above m cut (r). Next, each 
processor finds its maximum m cut over its range of r. Last, all the processors compare their 
m cut values to obtain the global maximum M cut that is used to truncate the Fourier series. 
Note that M cut calculated in this way does not vary with the number of processors. 

For approach I, the m cut (r) can also be evaluated based on \&(r) instead of £(r), because 
the communication is done after the partial summation. To be more accurate in the field cal- 
culation, we instead can evaluate m cut (r) based on the energy defined by ^^=0 l m ^m( r «)| 2 - 
The extra factor of m is included so that we obtain the energy of the force, which is the 
derivative of Notice, however, that if we compute m cut and M cut as described above, M cut 
will vary with the number of the processors. This is because each processor computes only 
a partial sum of ^f m (r). To produce a relatively constant M cut , after each processor finds its 
maximum m cut , all the processors take an energy- weighted average of their m cut to obtain 
the global M cut . It is observed that the energy- weighting average gives an M cut that remains 
nearly constant while the number of processors varies between 1 to N p . For verification 
reason, we can require that M cut be independent of the number of processors. However, 
that requires a costly global communication between different processors. Nonetheless, we 
find that M cut will only increase slightly with the number of processors, which means that 
increasing the number of processors will not degrade the numerical accuracy. We define a 
similar preset fraction e cu t to the previous density mode cut-off approach. After extensive ex- 
periments, we observe that e cut « 10~ 7 in \&(r) cut-off produces similar results to e cut = 10~ 4 
in S(r) cut-off. 

For density distributions in disk-planet simulations, the energy of the zero mode (m = 0) 
often dominates that of all other modes combined. Thus, when calculating the energy using 
£ or ^ , it is convenient to exclude the zero mode from the total energy. Notice that if 
the energy is defined as \m^/ m (r)\ 2 , the zero-mode is naturally excluded. To remove the 
numerical noise, we multiply the energy of the zero mode by a small number (e.g., e) and 
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add it to the total energy. In our disk-planet simulations, the initial density distribution 
is axisymmetric, and hence zero mode is enough. With the time evolution of the surface 
density, M cut begins to increase quickly until it reaches a certain number, which is insensitive 
to different resolutions and different initial power-law density profiles. However M cut does 
vary a lot with the planet mass and sound speed. For an example, we obtain the following 
data for simulations of an isothermal disk with different planet mass (M p ) and sound speed 

(c a ). 

{190, for M p = 3 x 1(T 5 M*, c s = 0.05 

297, for M p = 3 x 10- 4 M*, c s = 0.05 

298, for Mp = 3 x l(r 6 Af*, c s = 0.035 

Note that M cut is much smaller than the total number of modes in ^-direction, = 3200. 



3.2.2. Impact of the cutoff modes and cut-off thresholds 

Here, we study how the number of cut-off modes, M cut , impacts the computational 
efficiency and accuracy. We test the problem with parallel computation using 100 processors. 
We will show the results only for approach II. Similar results have also been obtained for 
approach I. 

Fig. [3] shows the variation in the computation and communication time with different 
numbers of cut-off modes. The total cost displayed in term of CPU time includes the com- 
putation of one FFT of the density field, the global communication of the density spectrum 
up-to the cut-off mode M cut right after the FFT, a quadrature calculation in the radial di- 
rection. Fig. [3] shows that the communication time increases at a slower rate with M cut than 
the computation time . 

The energy cut-off approach depends on the cut-off threshold e cut . We also vary the 
size of e cut to see how it impacts the cut-off mode. Fig. 0] shows the simulation results for 
different e cu t. Figs [3] and H] show that our cut-off threshold, e cu t = 10 -4 , which corresponds 
to Mcut — 153, appears to be the optimal number to balance accuracy and efficiency for this 
problem. This energy cut-off threshold is insensitive to grids with different resolutions. 
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Fig. 3. — The computation and communication cost for evaluating \l/(r, 0) (left) and the 
error variation (right) as a function of M cut . 
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Fig. 4. — The computation and communication cost for evaluating if>(r,4>) (left) and the 
error variation (right) as a function of energy cut-off thresholds e cu t- The cut-off is based on 
the energy of S. 



3.2.3. Parallelization via domain decomposition of the source grid: approach I 

The approach I is implemented via domain decomposition of the source grid r' . In 
terms of the communication cost, it involves a global reduction of N r M cut data from every 
processor and a distribution of N r M cut /N p data to each processor, where N p is the number 
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of processors. Here we use a fixed mode cut-off number M, 



cut 



153. 



Fig. [5] shows how the communication and computation cost vary with the number pro- 
cessors. Note that the communication cost remains relatively constant no matter how many 
number of the processors we use. This is a surprise, because we expect that the communi- 
cation cost is proportional to the length of the communicated data, which is proportional to 
the number of processors. We remark that one might obtain a different performance for a 
different MPI implementation or a different parallel cluster. In year 2007, we observed using 
the same code that the communication cost increases linearly with the number of processors. 
In year 2008, our parallel cluster has been upgraded with a new parallel software and a new 
inter-connection. The data shown in Fig. is calculated on the new cluster. 
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Fig. 5. — The communication and total cost for evaluating \l/(r, <fi) varying with the number 
of processors in MPI. M cut = 153. 
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3.2.4- Parallelization via domain decomposition of the field grid: approach II 

The approach II described above is implemented via domain decomposition of the field 
grid r. It involves a global gathering of N r M cut /N p data from every processor and a broadcast 
of N r M cut data to each processor. While the amount of communicated data is the same for 
both approaches, approach I also involves a global summation operation resulting in a total 
communication cost of about 10% more than approach II with the same value M cut . Again 
we show some results using only a fixed cut-off mode number M cut = 153. 

FigIS] (approach II) shows the performance comparison with the first approach. Similar 
to the results of approach I described in the last subsection, the communication cost remains 
nearly constant. It is clear that approach II (this approach) is faster than the approach I. 
Therefore we will use approach II as our choice of the methods in the tests hereafter. 

Fig. [6] shows the parallel efficiency for different numbers of processors (N p = 2 n , < 
n < 7) and grids with different resolutions. The parallel efficiency is defined by 

E(N ) = ^ seq ^ 
1 p) N p T(N p ) 

where T(N p ) is the run time of the parallel algorithm, and T seq (l) is the runtime of the 
sequential algorithm using one processor. For the fine resolution grid 1024 x 4096, we replace 
^seg(l) with 4T(4) due to the memory limitation of the processors. It is clear that for a 
smaller number of processors and a larger grid, the parallel efficiency is better. Fig. [6] shows 
that the parallel efficiency can be larger than 1 at certain stages. The reason is not clear to 
us. 

3.3. Comparison of different approaches for potential calculation 

We have tested two approaches for computing the potential and field: one is with the 
softening Eq. (flOl) and the other is without softening but with a shifted grid defined by Eq. 
(Till . Note that the field is calculated differently using different central-difference methods. 
The field force with softening is calculated using central-difference Eqs. (1121) and (1131) . The 
field force without softening is calculated using central-difference Eqs. (fT4"l) and ffT5l) . 

To minimize the impact of the cut-off mode, we use a fixed cut-off number M cut = 153, 
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number of processors 

Fig. 6. — The parallel efficiency for grids with different resolutions and different number of 
processors. M cut = 153 is used. 

which produces nearly the same results as without cut-off (see §3.2.2p . Table [1] shows the 
numerical errors for both approaches. The convergence order is calculated based on the 
maximum absolute error. The results show nearly second-order convergence as the mesh is 
refined. This is in agreement with the accuracy of our method, because both the quadrature 
rule to calculate the potential and the finite-difference method to calculate the force are of 
second-order accuracy. 

Based on global relative errors (RE in Table [T]) for both potential and field, we see that 
the potential method using the shifted grid without softening is more accurate than with 
softening. Yet the difference is relatively small. 
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Table 1: Numerical results for the potential calculation with and without softening Eq. ( fTOl . 
The top half is for with softening and the bottom half is for shifted grid without softening. 





Force error 




Potential error 


Grid 




RE 


V 




RE 


V 


iV r 




(10~ 2 )(1) 




(io- 2 ) 


(io- 5 ) 




128 


2.6514 


3.1388 




3.2631 


7.2648 




256 


0.7922 


0.9205 


1.74 


1.7595 


3.0652 


0.89 


512 


0.2317 


0.2983 


1.77 


0.6925 


1.4737 


1.35 


1024 


0.0725 


0.1298 


1.68 


0.2485 


0.9461 


1.49 


128 


3.3427 


2.4634 




5.5060 


5.1819 




256 


0.8539 


0.6580 


1.97 


1.4317 


1.8465 


1.94 


512 


0.2106 


0.2105 


2.02 


0.3775 


0.9995 


1.92 


1024 


0.0592 


0.1036 


1.83 


0.1115 


0.7909 


1.76 



3.4. Tests of point force calculation 

In the previous subsection, we have tested different approaches for disk self-gravity 
computed at the cell centers. In this subsection, we describe and compare methods to 
calculate the gravitational force of the disk on the planet at any point (r p ,0 p ). We assume 
a unit planet mass at that point. 

We investigated the force calculations in three different ways. In the first approach, we 
performed a direct summation of Q r and from the point (r p , <p p ) to each of the cell centers 
[Eqs. (flBT) and (fT7|) with (r, 0) = (r p , <fi p )}. This is similar to a particle method and is widely 
used in calculating the disk force exerted on the planet in the disk-planet simulations. In the 
second approach, we applied bilinear interpolation to the force computed at the cell centers. 
In the third approach, we applied a finite difference formula directly to ^> at the grid cells 
closest to the point location. This approach is very similar to bilinear interpolation of the 
force, only it uses a smaller grid stencil, which gives better accuracy. 

To calculate the force at a specific point and account for the fact that the point can be 
at an arbitrary location within a cell, we take a cell that contains point (r p , P ) = (0.99, 0.0), 
which is close to the center of one of the first Gaussian sphere, and compute the force for 
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11x11 equidistributed points within the cell. Since the third approach is similar to the 
second approach and verified to be more accurate, we only compare the first and third 
approaches. 

We use the softening defined by (fit)]) in the direct summation approach to avoid singu- 
larities. Fig. [7] shows the absolute error and relative error throughout the grid cell. We find 
that the maximum error and maximum relative error are 6.12 and 16.6%, respectively. The 
top two plots of Fig. [7] show that, although softening (fTOl) works very well for the disk self- 
gravity calculated at the cell-center and point force at the node- and edge-centers, it gives 
large error for the point force at other locations. The relative error range is [0.0152%, 16.6%], 
which means the direct summation with this softening is very sensitive to the location of the 
point. We have tried two other different softenings. The middle two plots of Fig. [TJshow the 
results with softening e = min(Ar, rA0), which approximates one grid spacing. Both the 
maximum error and maximum relative error are much reduced. Also the range of the relative 
error becomes [0.63%, 0.88%], which means the direction summation with this softening is 
relatively insensitive to the location of the point. The bottom t wo plots of Fig. [7] show the 



results with softening e = 0.3H(r), which has been adopted by iBaruteau fc Massetl (120081 ) 
and corresponds to seven grid spacings in our 800 x 3200 grid layout. Both the maximum 
error and relative error become large everywhere. The error range is [6.04,7.40], and the 
relative error range is [16. 5%, 16. 6%]. 



The third approach using finite difference on \l/ achieves much high accuracy, with maxi- 
mum error and maximum relative error of 0.0315 and 0.078%. Fig. [H]shows the absolute error 
and relative error throughout the grid cell. The relative error range is [0.0264%, 0.0778%], 
which means that this approach is relatively insensitive to the location of the point. 

We remark that in the actual disk-planet interaction simulations, the accuracy of the 
force on the planet also depends on how accurately the disk density around the planet is 
resolved. Since the disk within one Roche lobe is usually not well resolved, the third approach 
in the point force calculation, though by itself is very accurate, may not always give the most 
"accurate" force calculation if the disk density is poorly determined. In fact, we find that 
the direct summation approach with a relatively large softening, e .g. e = O.lH(r), gi ves 



results that are in good agreement with the linear theory results by iTanaka et al.l (120021 ) . 
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Fig. 7. — The accuracy of point force calculation by direct summation approach with different 
softenings: softening (1101) (top two), e = min(Ar, rA0) (middle two), and e = 0.3H(r) 
(bottom two). The left plots show the magnitude of the errors and the right plots show the 
relative errors, both throughout one grid cell. 
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x 10"" 




Fig. 8. — The accuracy of point force calculation finite-difference of the potential. The 
left plot show the magnitude of the errors and the right plot show the relative errors, both 
throughout one grid cell. 

3.5. Tests of the tree-code force calculation 

Here, we test the performance of our tree-code method described in Section l2T4l First, 
we test the accuracy with the same error and convergence measures used in section I3.3I 
For simplicity, we use a fixed softening e = a(l)Ar = 0.23Ar throughout the grid. The 
approximate model (124I) is used to calculate 3D correction factor. To obtain the field in the 
z = plane, we set H = in Eq. (I20I) resulting in an interaction force consistent with the 
Green's function method. Also, we use the same density function with known analytical force 
field described in Section |3j Second, we test the parallel efficiency by varying the number of 
processors for a highly resolved grid (800x3200). 

From the results given in Table [2] we draw the following conclusions. First, this method 
does not appear to converge to the exact solution. This is actually expected since the model 
force function (Eq. (j24|) ) with our chosen softening does not become more accurate as the 
grid is refined. We observe that the maximum error asymptotically approaches to 1.0, which 
is much larger than the error using the Green's function method for highly resolved grids. 
Next, we see that the time complexity scales approximately with 0(N r N ( j ) log(N r N r f ) )) , which 
is consistent with the theoretical prediction. Finally, we should remark that the softening 
has a large impact on the accuracy of the solutions for the tree-code. If e = 0.3H(r) is used, 
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Table 2: Accuracy and efficiency data for tree-code force calculation on Np=100 processors. 



Grid 


p 

- 1 max 




RE(1Q~ 2 ) 


V 


total CPU Time (s) 


100 x 400 


171.68 


5.617 


1.5216 




0.1078 


200 x 800 


171.72 


2.324 


0.8452 


1.27 


0.6844 


400 x 1600 


171.73 


1.390 


0.6291 


0.74 


3.7167 


800 x 3200 


171.74 


1.114 


0.5463 


0.32 


16.543 


800 x 3200* 


171.74 


0.08912 


0.0126 




0.02093 



"The last line is for Green function method, where the C 



'U time does not include the 



pre-computing time, which takes 1.001 second. 



Table 3: Parallel efficiency data for tree-code force calculation on 800x3200 fixed grid. 





CPU Time 




N p 


total time 


(s) comp. time (s) 


Parallel efficiency 


10 


182.32 


180.97 


0.99 


20 


92.78 


91.83 


0.98 


40 


46.92 


46.06 


0.96 


50 


37.71 


36.85 


0.96 


80 


24.07 


23.05 


0.94 


100 


20.07 


18.94 


0.90 


160 


13.80 


12.25 


0.82 


200 


11.28 


9.49 


0.80 



the maximum error and global relative error become 20.57 and 3.045% respectively. 

Parallel efficiency results can be seen in Table [3j Although we find our tree-code scales 
well with the number of processors, the actual value of the CPU time is more than two orders 
of magnitude greater than our Green's function method! We have experimented with tuning 
tree code parameters and other memory optimizations and have found that while it may be 
possible to gain a factor of 2 — 4 in speedup over the results of Tabled our tree-code method 
always performs much slower than our Green's function method. 
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4. Conclusion 

In this paper, we have presented a fast and accurate solver to calculate the potential and 
self-gravity forces for the disk systems. This method is implemented on a polar grid, and FFT 
can be used in the azimuthal direction. The pre- calculation of the Green's function and its 
FFT play a major role in the algorithm to reduce the computational cost. We think it could 
be the main reason why the Green's function method is much faster than the particle tree- 
code method. We also presented an efficient method in implementing the solver on parallel 
computers. We find the computational cost for the self-gravity solver to be comparable to 
that of the hydro solver for large, highly resolved grids run on a distributed memory parallel 
architecture. We also developed a 2D tree-code solver, which uses a relatively inexpensive 
model force to accurately account for the vertical structure of the disk. 

Finally, we notice that if the disk vertical structure varies with time, our pre-calculation 
must be done every time step making our solver inefficient. 

We have applied our self-gravity solver to simulations of disk-planet interaction system. 
Compared with the simulations without self-gravity, the total computation time is increased 
by only 30% for a parallel computation with 100 processors. We have also confirmed that 
the 2D self-gravity indeed accelerates the planet migration. These results will be reported 
elsewhere. 
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